The Method of Stock Manipulation on Twitter

Summer 2024 Data Science Project

Tutorial written by William Rubin, Pranay Akula, Sanjit Thangarasu, and Laura Jia


Introduction

As inflation continues to push prices higher, more and more Americans are investing in the stock market: stock ownership rose nine percentage points between 2016 and 2023, from 52% to 61%. Today's market is also highly volatile, with world events like the Great Recession in 2008 and the COVID pandemic in 2020 causing particularly violent spikes in market volatility. While it may be impossible to predict market swings of that magnitude, some researchers believe there may be a way of predicting smaller price changes: sentiment analysis.

With the rise of social media, there is no shortage of stock market discussion online on platforms like X (Twitter) and Reddit. Rather (in)famously, the GameStop short squeeze in early 2021 was driven in large part by the subreddit (discussion forum on Reddit) r/wallstreetbets, which helped push GameStop's price to roughly 30 times its initial value within the span of a month. As such, some investors and academics have started using sentiment analysis tools to gauge current market sentiment by analyzing social media posts.

In this tutorial, we will perform sentiment analysis on a dataset of social media posts and investigate potential correlations between these posts and stock market movements. More specifically, we aim to judge whether tweet volume and tweet sentiment are related to fluctuations in share price on the corresponding trading day. Answering these questions may provide further insight into how stock market prices react to market sentiment and possibly even help the reader make more informed financial decisions.

Table of Contents

Introduction

Data Preparation

Imports

Parse Data + Organize

Exploratory Data Analysis

Data Visualization

Primary Analysis

Primary Analysis (For Real This Time)

Conclusions

Data Preparation

For this project, we found a nice data set of tweets scraped from Twitter in 2021-2022 and the corresponding stock data for the same period of time. Before we get started working with these, we have to import a few libraries for working with the dataframes and visualizing our information later.

Imports

Make sure you run this before everything else!

In [ ]:
# Libraries
import pandas as pd
import numpy as np
import yfinance as yf
import scipy
import warnings
import re

# Plotting Tools
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

# Misc other imports, mostly for analysis
from scipy.stats import f_oneway
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

Once we've done that, we can read our data into two dataframes.

Parse Data + Organize

In [ ]:
# These are tweets a couple years back representing stocks being mentioned by random people on the internet
stocktweet = pd.read_csv("stock_tweets.csv")
stocktweet
Out[ ]:
Date Tweet Stock Name Company Name
0 2022-09-29 23:41:16+00:00 Mainstream media has done an amazing job at br... TSLA Tesla, Inc.
1 2022-09-29 23:24:43+00:00 Tesla delivery estimates are at around 364k fr... TSLA Tesla, Inc.
2 2022-09-29 23:18:08+00:00 3/ Even if I include 63.0M unvested RSUs as of... TSLA Tesla, Inc.
3 2022-09-29 22:40:07+00:00 @RealDanODowd @WholeMarsBlog @Tesla Hahaha why... TSLA Tesla, Inc.
4 2022-09-29 22:27:05+00:00 @RealDanODowd @Tesla Stop trying to kill kids,... TSLA Tesla, Inc.
... ... ... ... ...
80788 2021-10-07 17:11:57+00:00 Some of the fastest growing tech stocks on the... XPEV XPeng Inc.
80789 2021-10-04 17:05:59+00:00 With earnings on the horizon, here is a quick ... XPEV XPeng Inc.
80790 2021-10-01 04:43:41+00:00 Our record delivery results are a testimony of... XPEV XPeng Inc.
80791 2021-10-01 00:03:32+00:00 We delivered 10,412 Smart EVs in Sep 2021, rea... XPEV XPeng Inc.
80792 2021-09-30 10:22:52+00:00 Why can XPeng P5 deliver outstanding performan... XPEV XPeng Inc.

80793 rows × 4 columns

In [ ]:
# This is stock data for the stocks that were mentioned within some of the tweets in the first dataset
stock_data = pd.read_csv("stock_yfinance_data.csv")
stock_data
Out[ ]:
Date Open High Low Close Adj Close Volume Stock Name
0 2021-09-30 260.333344 263.043335 258.333344 258.493347 258.493347 53868000 TSLA
1 2021-10-01 259.466675 260.260010 254.529999 258.406677 258.406677 51094200 TSLA
2 2021-10-04 265.500000 268.989990 258.706665 260.510010 260.510010 91449900 TSLA
3 2021-10-05 261.600006 265.769989 258.066681 260.196655 260.196655 55297800 TSLA
4 2021-10-06 258.733337 262.220001 257.739990 260.916656 260.916656 43898400 TSLA
... ... ... ... ... ... ... ... ...
6295 2022-09-23 13.090000 13.892000 12.860000 13.710000 13.710000 28279600 XPEV
6296 2022-09-26 14.280000 14.830000 14.070000 14.370000 14.370000 27891300 XPEV
6297 2022-09-27 14.580000 14.800000 13.580000 13.710000 13.710000 21160800 XPEV
6298 2022-09-28 13.050000 13.421000 12.690000 13.330000 13.330000 31799400 XPEV
6299 2022-09-29 12.550000 12.850000 11.850000 12.110000 12.110000 33044800 XPEV

6300 rows × 8 columns

As you can see from the first few tweets, a lot of tweets contain mentions (@), which may confuse our sentiment analysis model later. So, we're going to get rid of those.

In [ ]:
# Clearing out the @s for the tweets to make them clearer

# Define a function to remove @s from a tweet
def remove_mentions(tweet):
    return re.sub(r'@\w+', '', tweet)

# Apply the function to the tweets
stocktweet['Tweet'] = stocktweet['Tweet'].apply(remove_mentions)

# Print the cleaned DataFrame to verify the changes
stocktweet
Out[ ]:
Date Tweet Stock Name Company Name
0 2022-09-29 23:41:16+00:00 Mainstream media has done an amazing job at br... TSLA Tesla, Inc.
1 2022-09-29 23:24:43+00:00 Tesla delivery estimates are at around 364k fr... TSLA Tesla, Inc.
2 2022-09-29 23:18:08+00:00 3/ Even if I include 63.0M unvested RSUs as of... TSLA Tesla, Inc.
3 2022-09-29 22:40:07+00:00 Hahaha why are you still trying to stop Tes... TSLA Tesla, Inc.
4 2022-09-29 22:27:05+00:00 Stop trying to kill kids, you sad deranged o... TSLA Tesla, Inc.
... ... ... ... ...
80788 2021-10-07 17:11:57+00:00 Some of the fastest growing tech stocks on the... XPEV XPeng Inc.
80789 2021-10-04 17:05:59+00:00 With earnings on the horizon, here is a quick ... XPEV XPeng Inc.
80790 2021-10-01 04:43:41+00:00 Our record delivery results are a testimony of... XPEV XPeng Inc.
80791 2021-10-01 00:03:32+00:00 We delivered 10,412 Smart EVs in Sep 2021, rea... XPEV XPeng Inc.
80792 2021-09-30 10:22:52+00:00 Why can XPeng P5 deliver outstanding performan... XPEV XPeng Inc.

80793 rows × 4 columns

Once we've cleaned up our tweets, we have to organize our data into one big dataframe that matches tweets about certain stocks to the corresponding stock data from that same day. We can accomplish this by merging our stock information dataframe into our stock tweet dataframe, which attaches the stock data to every applicable tweet. Then, we can sort our tweets by date and reindex them accordingly.

In [ ]:
# Merging the two datasets to show tweets and corresponding stock price changes from the same day

# Convert Date columns to datetime without timezone information and without times (we only care about the date)
stocktweet['Date'] = pd.to_datetime(stocktweet['Date']).dt.tz_localize(None).dt.date
stock_data['Date'] = pd.to_datetime(stock_data['Date']).dt.tz_localize(None).dt.date

# Merge datasets on Date and Stock Name columns
merged_df = pd.merge(stocktweet, stock_data, on=['Date', 'Stock Name'], how='left')

# Sanitizing just in case
merged_df = merged_df.dropna(subset=['Adj Close'])

# Sort tweets by date tweeted and reindexing
merged_df.sort_values(by=["Date"], inplace = True)
merged_df.reset_index(inplace=True)

# Display the first and last few rows of the merged DataFrame to verify the dates are sorted
merged_df
Out[ ]:
index Date Tweet Stock Name Company Name Open High Low Close Adj Close Volume
0 80792 2021-09-30 Why can XPeng P5 deliver outstanding performan... XPEV XPeng Inc. 35.029999 36.110001 34.816002 35.540001 35.540001 6461500.0
1 37341 2021-09-30 $TSLA Little teaser, more pictures soon 😍🚀🙌🏻\n... TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347 258.493347 53868000.0
2 37340 2021-09-30 UPDATE on Q3 Delivery Estimates:\n\n* FactSet ... TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347 258.493347 53868000.0
3 37339 2021-09-30 To set the record straight, my comments yester... TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347 258.493347 53868000.0
4 37338 2021-09-30 wow. FSD Beta 10.1 is incredibly good. Not per... TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347 258.493347 53868000.0
... ... ... ... ... ... ... ... ... ... ... ...
63671 52442 2022-09-29 Stocks I think entering intriguing levels to a... GOOG Alphabet Inc. 99.300003 99.300003 96.519997 98.089996 98.089996 21921500.0
63672 52441 2022-09-29 That's right everyone - $GOOG is officially a ... GOOG Alphabet Inc. 99.300003 99.300003 96.519997 98.089996 98.089996 21921500.0
63673 52440 2022-09-29 Top 10 $QQQ Holdings \n\nAnd Credit Rating\n\n... GOOG Alphabet Inc. 99.300003 99.300003 96.519997 98.089996 98.089996 21921500.0
63674 111 2022-09-29 What would I do as a new trader to become succ... TSLA Tesla, Inc. 282.760010 283.649994 265.779999 268.209991 268.209991 77620600.0
63675 0 2022-09-29 Mainstream media has done an amazing job at br... TSLA Tesla, Inc. 282.760010 283.649994 265.779999 268.209991 268.209991 77620600.0

63676 rows × 11 columns

Exploratory Data Analysis

Now that our data is organized, we can get into doing some basic analysis to see what we're working with.

In [ ]:
# Pranay Akula - ANOVA Test

# Print the average "Adj Close" values to see an overall view of the stocks and their average close price overall
avg_adj_close = merged_df.groupby('Stock Name')['Adj Close'].mean().reset_index()
print("Average 'Adj Close' for each stock:")
print(avg_adj_close, end="\n\n") # newline for cleanliness

# Get the count of unique stocks
unique_stock_count = merged_df['Stock Name'].nunique()
print(f"Number of unique stocks: {unique_stock_count}")

# Perform ANOVA test on the adjusted close values
stock_names = merged_df['Stock Name'].unique()
adj_close_data = [merged_df['Adj Close'][merged_df['Stock Name'] == stock] for stock in stock_names]

anova_result = f_oneway(*adj_close_data)

# Print ANOVA test result
print("\nANOVA test result:")
print(f"F-statistic: {anova_result.statistic}")
print(f"P-value: {anova_result.pvalue}")
Average 'Adj Close' for each stock:
   Stock Name   Adj Close
0        AAPL  158.949847
1         AMD  110.739538
2        AMZN  142.243956
3          BA  190.544477
4          BX  113.306495
5        COST  509.653180
6         CRM  204.565144
7         DIS  134.904651
8        ENPH  249.427267
9           F   15.872072
10       GOOG  127.573216
11       INTC   41.509379
12         KO   60.151648
13       META  256.918062
14       MSFT  288.570933
15       NFLX  332.772137
16        NIO   26.433655
17        NOC  431.256039
18         PG  149.419906
19       PYPL  154.344905
20       TSLA  306.104950
21        TSM  102.971784
22         VZ   47.069417
23       XPEV   38.841471
24         ZS  235.949231

Number of unique stocks: 25

ANOVA test result:
F-statistic: 13034.303769802094
P-value: 0.0

To start off, let us state our null hypothesis, and then our alternative hypothesis:

$H_{0}$: Stock tweets have no influence on stock ticker prices.

$H_{A}$: Stock tweets have an influence on stock ticker prices.

Based on the ANOVA test, the p-value is reported as 0.0, effectively zero and far below the typical significance level of 0.05, which is strong evidence against the null hypothesis. The F-statistic is roughly 13,034, which is very large and suggests that the variation between the group means is much greater than the variation within groups. Given the low p-value and high F-statistic, we reject the null hypothesis: there is a statistically significant difference in the average 'Adj Close' prices among the different stocks. In other words, the average adjusted closing prices are not the same across the 25 stocks in our dataset.
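For reference, the F-statistic reported above compares the variability between the group means to the variability within the groups. With $k$ groups (here $k = 25$ stocks), group sizes $n_i$, group means $\bar{x}_i$, overall mean $\bar{x}$, and total sample size $N$:

$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 / (N - k)}$$

A large F means the group means are spread out relative to the noise within each group, which is exactly what we observe here.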

In [ ]:
# William Rubin - Sentiment Analysis Correlation to Percentage Change (Looking at first few tweets)

# Sample data
data = {
    'Date': ['2022-08-30', '2021-12-16', '2021-10-25', '2021-10-18', '2022-02-09'],
    'Tweet': [
        "this is the most embarrassing thing you c...",
        "FREE #OPTIONS Ideas 🤯\n\nScale out when above ...",
        "What stocks are you watching this week? Beside...",
        "Elite Options Watchlist 💡\n\n📈 $AMZN 3500C ove...",
        "Win It Wednesday Triggers 🎯\n\n🌎 $GOOGL 2900c ..."
    ],
    'Stock Name': ['TSLA', 'TSLA', 'TSLA', 'TSLA', 'MSFT'],
    'Company Name': ['Tesla, Inc.', 'Tesla, Inc.', 'Tesla, Inc.', 'Tesla, Inc.', 'Microsoft Corporation'],
    'Open': [287.869995, 331.500000, 316.843323, 283.929993, 309.869995],
    'Close': [277.700012, 308.973328, 341.619995, 290.036682, 311.209991],
    'Adj Close': [277.700012, 308.973328, 341.619995, 290.036682, 308.320984],
    'Volume': [50541800, 82771500, 188556300, 72621600, 31284700]
}

sample_df = pd.DataFrame(data)

# Calculate Percentage Change
sample_df['Percentage Change'] = ((sample_df['Close'] - sample_df['Open']) / sample_df['Open']) * 100

# Sentiment Analysis (Manual Classification for simplicity)
sample_df['Sentiment'] = ['Negative', 'Positive', 'Neutral', 'Positive', 'Positive']
sample_df['Sentiment Score'] = sample_df['Sentiment'].map({'Negative': -1, 'Neutral': 0, 'Positive': 1})

# Correlation Analysis
correlation = sample_df['Sentiment Score'].corr(sample_df['Percentage Change'])

sample_df, correlation
Out[ ]:
(         Date                                              Tweet Stock Name  \
 0  2022-08-30       this is the most embarrassing thing you c...       TSLA   
 1  2021-12-16  FREE #OPTIONS Ideas 🤯\n\nScale out when above ...       TSLA   
 2  2021-10-25  What stocks are you watching this week? Beside...       TSLA   
 3  2021-10-18  Elite Options Watchlist 💡\n\n📈 $AMZN 3500C ove...       TSLA   
 4  2022-02-09  Win It Wednesday Triggers 🎯\n\n🌎 $GOOGL 2900c ...       MSFT   
 
             Company Name        Open       Close   Adj Close     Volume  \
 0            Tesla, Inc.  287.869995  277.700012  277.700012   50541800   
 1            Tesla, Inc.  331.500000  308.973328  308.973328   82771500   
 2            Tesla, Inc.  316.843323  341.619995  341.619995  188556300   
 3            Tesla, Inc.  283.929993  290.036682  290.036682   72621600   
 4  Microsoft Corporation  309.869995  311.209991  308.320984   31284700   
 
    Percentage Change Sentiment  Sentiment Score  
 0          -3.532839  Negative               -1  
 1          -6.795376  Positive                1  
 2           7.819850   Neutral                0  
 3           2.150773  Positive                1  
 4           0.432438  Positive                1  ,
 -0.0355172846627667)

Our next conclusion is based on the percentage change combined with a sentiment analysis, using our own simple criteria to assign sentiment. In the output above, we focus on three columns: Percentage Change, Sentiment, and Sentiment Score. The Sentiment/Sentiment Score columns reflect the mood of the tweet sent at that date and time, which we compare against how the stock moved that day (if it moved at all). Our sample focuses primarily on TSLA for the first four rows (0-3), and TSLA seems to be all over the place regardless of whether the tweets are positive or negative. For example, in row 0 TSLA had a negative tweet and went down, which is what we would usually expect in the stock market. However, in row 1 TSLA has a positive tweet and goes down even further. And to put more icing on the cake, in row 2 we see a "Neutral" tweet, and TSLA goes skyrocketing up by almost 8%. So tweets may have an influence at times, but TSLA in particular is all over the place, and considering the timing of these tweets and the price of the stock, it is worth remembering that this was a period when Elon Musk was in some turmoil regarding Tesla, as well as his aerospace company, SpaceX.

Our final observation based on this data concerns trading volumes. There appears to be a noticeable difference in trading volumes depending on tweet sentiment. Positive tweet sentiment is associated with increased trading volume, indicating heightened investor interest and activity. Negative sentiment can produce a similar effect: investors react to bad news or negative sentiment by selling, which also drives trading volume up and can push the stock price down even further. Overall, this highlights the potential influence of social media, particularly tweets, on trading volumes in financial markets, even though we are looking at data from a couple of years back.
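To make the mechanics concrete on the small hand-labeled sample above (five rows is far too few to draw real conclusions, but the same grouping would apply to the full dataset once every tweet has a sentiment label), we can compare the average trading volume for each sentiment label:

In [ ]:
# Average trading volume per (hand-labeled) sentiment in the 5-row sample above
volume_by_sentiment = sample_df.groupby('Sentiment')['Volume'].mean()
print(volume_by_sentiment)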

Data Visualization

Now that the data is cleaned, we can work on some visualization. First, we can group and count the tweets based on the stock mentioned.

In [ ]:
#Laura Jia - Data visualization with pretty graphs
#Using seaborn (sns) and matplotlib (plt) to visualize data
plt.figure(figsize = (15, 10))
sns.set_style("dark")
tweet_ps = sns.countplot(x = 'Stock Name', data = merged_df,
                         order = merged_df['Stock Name'].value_counts().index,
                         hue = 'Stock Name', legend = False,
                         palette = sns.color_palette('flare', n_colors=25))
plt.title('Tweets per Stock')
plt.xlabel('Stock')
plt.ylabel('Tweets')
sns.set_palette('flare')
sns.set()

As we can see, Tesla has the most tweets, and by a large margin, far surpassing its closest competitor, Taiwan Semiconductor Manufacturing.

We can see the specific number of tweets per stock as well:

In [ ]:
#Sort stocks by number of tweets
group_sizes = merged_df.groupby('Stock Name').size().sort_values(ascending=False)
print("Number of tweets per stock:", group_sizes)
Number of tweets per stock: Stock Name
TSLA    30028
TSM      7570
AAPL     4131
AMZN     3340
PG       3340
MSFT     3340
META     2317
NIO      2282
AMD      1796
NFLX     1464
GOOG     1053
PYPL      681
DIS       516
COST      280
BA        277
INTC      248
KO        210
CRM       173
XPEV      170
ENPH      150
ZS        143
VZ         82
BX         33
NOC        26
F          26
dtype: int64

Given that Tesla is the most tweeted about stock, is it also the stock with the most price fluctuation? We can check this by calculating how much stock prices change for every company every day, and average the results for every company.

In [ ]:
#Calculating the average percentage difference between a stock's highest and lowest price per day
#Values are taken as absolute values since we are not currently differentiating between positive and negative
#Based on code from sentiment analysis above

#A new dataframe with a fluctuation column
pc_df = stock_data.copy()
pc_df['Fluctuation'] = abs(((pc_df['High'] - pc_df['Low']) / pc_df['Low']) * 100)

#A copy of pc_df we're using here
pc_graph = pc_df.groupby('Stock Name')['Fluctuation'].mean().sort_values(ascending=False)

plt.figure(figsize = (15, 10))
sns.set_style("dark")
#Bit of a wordy title, unfortunately
plt.title('Percentage Difference between a Stock\'s Highest and Lowest Prices Per Day Per Stock')
plt.xlabel('Stock')
plt.ylabel('Percentage Difference')
pc_graph.plot(kind='bar')
Out[ ]:
<Axes: title={'center': "Percentage Difference between a Stock's Highest and Lowest Prices Per Day Per Stock"}, xlabel='Stock Name', ylabel='Percentage Difference'>

So, it seems that Tesla is not the stock with the most fluctuation. Surprisingly, there seems to be little correlation at all between the number of tweets made about a company and its price change. Of course, we haven't yet started differentiating between "positive" and "negative" tweets, so for now we can only conclude that publicity alone does not necessarily mean bigger price changes, either up or down.
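To put a rough number on that, we can line up the per-stock tweet counts (group_sizes from earlier) with each stock's average daily fluctuation (pc_graph) and check how strongly they move together; this is just a quick sanity check:

In [ ]:
# Quick check: correlation between per-stock tweet counts and average daily fluctuation
per_stock = pd.concat([group_sizes.rename('Tweet Count'),
                       pc_graph.rename('Avg Fluctuation')], axis=1)
print(per_stock.corr())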

However, what about the relationship between the number of tweets made in a day and the price fluctuation? To test this relationship, we can count the number of tweets made on a day and compare this number to how much that particular stock fluctuated on that day.

In [ ]:
pc_group = merged_df
pc_group['Fluctuation'] = abs(((pc_group['High'] - pc_group['Low']) / pc_group['Low']) * 100)
pc_group['Counts'] = 1

pc_group = pc_group.groupby(['Stock Name', 'Date']).agg({'Counts' : 'count', 'Fluctuation' : 'mean'})

plt.figure(figsize = (25, 10))
sns.set_style("dark")
plt.title('Price Fluctuation by Tweet Count')
plt.xlabel('Number of Tweets Made')
plt.ylabel('Fluctuation')

sns.scatterplot(data=pc_group, x="Counts", y="Fluctuation")
Out[ ]:
<Axes: title={'center': 'Price Fluctuation by Tweet Count'}, xlabel='Number of Tweets Made', ylabel='Fluctuation'>

A much more interesting graph!

In [ ]:
sns.lmplot(data=pc_group, x="Counts", y="Fluctuation")
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x17c74d4f0>

Here, there does seem to be some positive correlation between tweets made per day and how much the price fluctuated by, so we can conclude that there is likely some relationship between how many tweets are made about a particular stock in a day and that stock's price fluctuation.
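If we want a number to back up that visual impression, we can compute the Pearson correlation (and its p-value) between daily tweet counts and daily fluctuation:

In [ ]:
# Pearson correlation between daily tweet count and daily price fluctuation
from scipy.stats import pearsonr

r, p = pearsonr(pc_group['Counts'], pc_group['Fluctuation'])
print(f"Pearson r: {r:.3f}, p-value: {p:.3g}")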

In [ ]:
# Sanjit Thangarasu
# Analysis 1: Tweet Volume vs. Stock Volume
# Count the number of tweets per day per stock
tweet_volume = merged_df.groupby(['Date', 'Stock Name']).size().reset_index(name='Tweet Volume')

# Sum the stock volume per day per stock
stock_volume = merged_df.groupby(['Date', 'Stock Name'])['Volume'].sum().reset_index(name='Stock Volume')

# Merge the tweet volume and stock volume dataframes
volume_df = pd.merge(tweet_volume, stock_volume, on=['Date', 'Stock Name'])

# Plot Tweet Volume vs. Stock Volume
plt.figure(figsize=(12, 6))
sns.set(style="darkgrid")
sns.regplot(x='Tweet Volume', y='Stock Volume', data=volume_df, scatter_kws={'alpha':0.5})
plt.title('Tweet Volume vs. Stock Volume')
plt.xlabel('Tweet Volume')
plt.ylabel('Stock Volume')
plt.show()
  • Visualization: The scatter plot with a regression line shows the relationship between tweet volume and stock volume for the merged dataset.
  • Observations:
    • Most data points are clustered at lower tweet volumes (below 100 tweets) and lower stock volumes.
    • A few outliers exist with exceptionally high tweet volumes (over 200) and corresponding higher stock volumes.
    • The regression line indicates a slight positive correlation, suggesting that as tweet volume increases, stock volume tends to increase as well, although this relationship is not very strong.
  • Interpretation: This plot suggests a weak but positive correlation between tweet volume and stock volume. Higher tweet activity might be associated with higher trading activity, but the correlation is not strong enough to infer causation. The quick check below puts a number on this correlation.
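As a quick follow-up to the interpretation above, we can compute the correlation coefficient between the two volume columns directly:

In [ ]:
# Correlation between daily tweet volume and total daily trading volume
print(volume_df['Tweet Volume'].corr(volume_df['Stock Volume']))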
In [ ]:
# Analysis 2: Tesla Tweet Volume Over Time and Correlation with Stock Returns

# Filter data for Tesla (TSLA)
tesla_data = merged_df[merged_df['Stock Name'] == 'TSLA'].copy()

# Calculate daily tweet volume for Tesla
tesla_tweet_volume = tesla_data.groupby('Date').size().reset_index(name='Tweet Volume')

# Calculate daily returns for Tesla
tesla_data.loc[:, 'Daily Return'] = tesla_data['Adj Close'].pct_change()
tesla_returns = tesla_data.groupby('Date')['Daily Return'].mean().reset_index()

# Merge daily tweet volume with daily returns for Tesla
tesla_correlation_df = pd.merge(tesla_tweet_volume, tesla_returns, on='Date')

# Plot Tweet Volume vs. Stock Returns for Tesla
fig, ax1 = plt.subplots(figsize=(12, 6))

color = 'tab:blue'
ax1.set_xlabel('Date')
ax1.set_ylabel('Tweet Volume', color=color)
ax1.plot(tesla_correlation_df['Date'], tesla_correlation_df['Tweet Volume'], color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Daily Return', color=color)
ax2.plot(tesla_correlation_df['Date'], tesla_correlation_df['Daily Return'], color=color)
ax2.tick_params(axis='y', labelcolor=color)

plt.title('Tesla: Tweet Volume and Stock Returns Over Time')
fig.tight_layout()
plt.show()

# Calculate and print correlation for Tesla
tesla_correlation = tesla_correlation_df['Tweet Volume'].corr(tesla_correlation_df['Daily Return'])
print(f"Correlation between Tesla Tweet Volume and Stock Returns: {tesla_correlation}")
Correlation between Tesla Tweet Volume and Stock Returns: -0.0230282294465532
  • Visualization: The dual-axis plot shows Tesla's tweet volume and daily stock returns over time.
    • The blue line represents tweet volume.
    • The red line represents daily stock returns.
  • Observations:
    • Tweet volumes show noticeable spikes on specific dates, indicating periods of high social media activity related to Tesla.
    • Stock returns also fluctuate over time but do not seem to directly correspond with the spikes in tweet volume.
    • The correlation between tweet volume and daily returns is approximately -0.023, indicating an extremely weak negative correlation.
  • Interpretation: The plot indicates that there is no significant correlation between tweet volume and Tesla's stock returns. Despite periods of high tweet activity, the impact on stock returns is minimal. This suggests that other factors beyond tweet volume are likely more influential in determining Tesla's stock performance.

Primary Analysis

To compare our tweets and data, we need to do a quick sentiment analysis first.

In [ ]:
## Sentiment Analysis with TextBlob and TF-IDF Features - Pranay Akula ##

def preprocess_text(text):
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = text.lower()
    return text

# Apply preprocessing
merged_df['Processed_Tweet'] = merged_df['Tweet'].apply(preprocess_text)

# Define the vectorizer
vectorizer = TfidfVectorizer(max_features=3000)

# Transform the processed tweets to TF-IDF features
X = vectorizer.fit_transform(merged_df['Processed_Tweet'])

# Using TextBlob for sentiment analysis
merged_df['Predicted_Sentiment'] = merged_df['Tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
merged_df['Predicted_Sentiment_Label'] = merged_df['Predicted_Sentiment'].apply(lambda x: 1 if x > 0 else 0 if x == 0 else -1)
print(merged_df.head())
   index        Date                                              Tweet  \
0  80792  2021-09-30  Why can XPeng P5 deliver outstanding performan...   
1  37341  2021-09-30  $TSLA Little teaser, more pictures soon 😍🚀🙌🏻\n...   
2  37340  2021-09-30  UPDATE on Q3 Delivery Estimates:\n\n* FactSet ...   
3  37339  2021-09-30  To set the record straight, my comments yester...   
4  37338  2021-09-30  wow. FSD Beta 10.1 is incredibly good. Not per...   

  Stock Name Company Name        Open        High         Low       Close  \
0       XPEV   XPeng Inc.   35.029999   36.110001   34.816002   35.540001   
1       TSLA  Tesla, Inc.  260.333344  263.043335  258.333344  258.493347   
2       TSLA  Tesla, Inc.  260.333344  263.043335  258.333344  258.493347   
3       TSLA  Tesla, Inc.  260.333344  263.043335  258.333344  258.493347   
4       TSLA  Tesla, Inc.  260.333344  263.043335  258.333344  258.493347   

    Adj Close      Volume  Fluctuation  Counts  \
0   35.540001   6461500.0     3.716678       1   
1  258.493347  53868000.0     1.823222       1   
2  258.493347  53868000.0     1.823222       1   
3  258.493347  53868000.0     1.823222       1   
4  258.493347  53868000.0     1.823222       1   

                                     Processed_Tweet  Predicted_Sentiment  \
0  why can xpeng p5 deliver outstanding performan...             0.187500   
1   tsla little teaser more pictures soon https t...             0.156250   
2  update on q3 delivery estimates factset 204k w...             0.000000   
3  to set the record straight my comments yesterd...             0.266667   
4  wow fsd beta 10 1 is incredibly good not perfe...             0.162500   

   Predicted_Sentiment_Label  
0                          1  
1                          1  
2                          0  
3                          1  
4                          1  

With that done, we can get into trying to train a model that can accurately predict stock prices based on our tweet data. First, why don't we train a linear regression model?

Note: You don't have to do this step. This is just an example, for educational purposes.

Here, we will just be applying our model to TSLA, as it has the most available tweets.

In [ ]:
# Prepare the data for price prediction
merged_df['Fluctuation'] = abs(((merged_df['High'] - merged_df['Low']) / merged_df['Low']) * 100)
price_df_TSLA = merged_df[merged_df['Stock Name'] == 'TSLA'].groupby('Date').agg({
    'Predicted_Sentiment': 'mean',
    'Fluctuation': 'mean'
}).reset_index()

# Feature and target variables
X_price = price_df_TSLA[['Predicted_Sentiment']]
y_price = price_df_TSLA['Fluctuation']

# Split data into train and test sets for price prediction
X_train_price, X_test_price, y_train_price, y_test_price = train_test_split(X_price, y_price, test_size=0.2, random_state=42)

# Train the Linear Regression model
price_model = LinearRegression()
price_model.fit(X_train_price, y_train_price)

# Predict stock price fluctuation
price_df_TSLA['Predicted_Price'] = price_model.predict(price_df_TSLA[['Predicted_Sentiment']])

# Plot actual vs. predicted prices
plt.figure(figsize=(14, 7))
plt.plot(price_df_TSLA['Date'], price_df_TSLA['Fluctuation'], label='Actual Fluctuation', color='b')
plt.plot(price_df_TSLA['Date'], price_df_TSLA['Predicted_Price'], label='Predicted Fluctuation', color='r', linestyle='--')
#Good old wordy title, back at it again
plt.title('TSLA Actual vs. Predicted Percent Difference Between Daily Highest and Lowest Prices')
plt.xlabel('Date')
plt.ylabel('Percent difference')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.show()

Wow! It didn't work at all!

So, what went wrong? Linear regression models work by fitting a dependent variable against one (or more) independent variables, treating every observation as independent of the others. Our data, however, is a time series: stock prices evolve over time, and each day's fluctuation depends on the days that came before it. A model that treats each day as a completely new, unrelated data point cannot capture those temporal patterns.
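To quantify how far off the model is, we can score the held-out test split with the mean_absolute_error we imported earlier; this is just a quick sanity check.

In [ ]:
# Evaluate the linear regression on the held-out test split
y_pred_price = price_model.predict(X_test_price)
print("Test MAE (percentage points):", mean_absolute_error(y_test_price, y_pred_price))
print("Std. dev. of daily fluctuation (for scale):", y_price.std())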

To solve this problem, we can train a model that treats time as a continuous sequence: an LSTM model.

Primary Analysis (For Real This Time)

A Long Short-Term Memory (LSTM) model is a type of neural network designed to handle sequential data, i.e. data recorded over a continuous period of time, which is exactly what stock data is. So, to tackle our particular problem, we must pivot away from our regression model and start building a neural network.

Before we can start, we need to preprocess our data. First, we use StandardScaler from scikit-learn to standardize each feature.

Next, we apply Principal Component Analysis (PCA) to simplify our data. It looks great right now, but there are still eight numeric features per day, far too many.

In [ ]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import os

# Function to preprocess each stock's data
def preprocess_stock_data(df, stock_name):
    stock_df = df[df['Stock Name'] == stock_name].copy()
    columns_to_keep = ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Fluctuation', 'Predicted_Sentiment']
    stock_df = stock_df[columns_to_keep]
    stock_df.set_index('Date', inplace=True)
    scaler = StandardScaler()
    normalized_stock_df = pd.DataFrame(scaler.fit_transform(stock_df), columns=stock_df.columns, index=stock_df.index)
    return normalized_stock_df

# Get unique stock names
stock_names = merged_df['Stock Name'].unique()

# Dictionary to hold the preprocessed data for each stock
preprocessed_data = {}
for stock_name in stock_names:
    preprocessed_data[stock_name] = preprocess_stock_data(merged_df, stock_name)

# Apply PCA
def apply_pca(df: pd.DataFrame, variance_threshold: float = 0.95) -> pd.DataFrame:
    pca = PCA()
    pca.fit(df)

    # Select the number of components that explain the desired variance
    cumulative_variance = pca.explained_variance_ratio_.cumsum()
    num_components = next(i for i, total_variance in enumerate(cumulative_variance) if total_variance >= variance_threshold) + 1

    # Apply PCA with the selected number of components
    pca = PCA(n_components=num_components)
    transformed_data = pca.fit_transform(df)

    # Convert transformed data back to a DataFrame
    pca_df = pd.DataFrame(transformed_data, index=df.index, columns=[f'PC{i+1}' for i in range(num_components)])
    return pca_df, cumulative_variance

# Plot cumulative explained variance ratios for all tickers
def plot_cumulative_variances(cumulative_variances: dict):
    plt.figure(figsize=(15, 8))
    for ticker, cumulative_variance in cumulative_variances.items():
        plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', label=ticker)
    plt.xlabel('Number of Components')
    plt.ylabel('Cumulative Explained Variance')
    plt.title('Explained Variance by Number of Components for Different Tickers')
    plt.axhline(y=0.95, color='r', linestyle='--')
    plt.legend()

    save_path = 'outputs/graphs/cumulative_explained_variance.sample.png'
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    plt.savefig(save_path)

    plt.show()
    plt.close()

# Apply PCA to each ticker in the stock data
def apply_pca_stock_data(stock_data: dict, variance_threshold: float = 0.95) -> dict:
    tickers = list(stock_data.keys())
    cumulative_variances = {}
    for ticker in tickers:
        stock_data[ticker]['pca_data'], cumulative_variance = apply_pca(stock_data[ticker]['normalized_data'], variance_threshold)
        cumulative_variances[ticker] = cumulative_variance
        print(f"PCA applied to {ticker} ✅")

    return stock_data, cumulative_variances

# Initialize stock data
stock_data = {stock_name: {'normalized_data': data} for stock_name, data in preprocessed_data.items()}

# Apply PCA to stock data
stock_data, cumulative_variances = apply_pca_stock_data(stock_data)

# Plot the cumulative explained variance ratios
plot_cumulative_variances(cumulative_variances)
PCA applied to XPEV ✅
PCA applied to TSLA ✅
PCA applied to NIO ✅
PCA applied to DIS ✅
PCA applied to TSM ✅
PCA applied to AAPL ✅
PCA applied to AMD ✅
PCA applied to GOOG ✅
PCA applied to AMZN ✅
PCA applied to META ✅
PCA applied to PG ✅
PCA applied to MSFT ✅
PCA applied to NFLX ✅
PCA applied to CRM ✅
PCA applied to ZS ✅
PCA applied to ENPH ✅
PCA applied to PYPL ✅
PCA applied to COST ✅
PCA applied to BA ✅
PCA applied to KO ✅
PCA applied to F ✅
PCA applied to INTC ✅
PCA applied to BX ✅
PCA applied to NOC ✅
PCA applied to VZ ✅

From PCA and our lovely little elbow graph, we can see that the optimal number of components lies somewhere between 3 and 4, and it varies for every stock. Speaking of variation between stocks: every stock has a different number of tweets, amount of data, and its own patterns, so we must train a new model for every single one of our stocks.
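If you'd rather not eyeball the elbow graph, the number of components kept for each ticker can be read directly off the PCA output from the previous cell:

In [ ]:
# Number of principal components kept per ticker (95% explained-variance threshold)
for ticker, data in stock_data.items():
    print(ticker, data['pca_data'].shape[1])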

In [ ]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Function to create sequences for LSTM
def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i:i + seq_length])
    return np.array(sequences)

sequence_length = 60  # Length of the sequences for LSTM

# Prepare the data for each stock
lstm_data = {}
for stock_name, data in stock_data.items():
    pca_data = data['pca_data']

    # Create sequences
    sequences = create_sequences(pca_data.values, sequence_length)

    if sequences.size == 0:
        print(f"Skipping {stock_name} due to insufficient data for sequences.")
        continue

    # Split into features and target
    X = sequences[:, :-1]
    y = sequences[:, -1, 0]  # Predicting the first principal component as target

    # Split into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    lstm_data[stock_name] = {
        'X_train': X_train,
        'X_test': X_test,
        'y_train': y_train,
        'y_test': y_test
    }

# Display the shape of the training data for all stocks
print("\nShape of the training data for each stock:")
for stock_name, data in lstm_data.items():
    print(f"Training data for {stock_name}: X_train shape: {data['X_train'].shape}, y_train shape: {data['y_train'].shape}")
Skipping F due to insufficient data for sequences.
Skipping BX due to insufficient data for sequences.
Skipping NOC due to insufficient data for sequences.

Shape of the training data for each stock:
Training data for XPEV: X_train shape: (88, 59, 3), y_train shape: (88,)
Training data for TSLA: X_train shape: (23974, 59, 3), y_train shape: (23974,)
Training data for NIO: X_train shape: (1777, 59, 3), y_train shape: (1777,)
Training data for DIS: X_train shape: (364, 59, 3), y_train shape: (364,)
Training data for TSM: X_train shape: (6008, 59, 3), y_train shape: (6008,)
Training data for AAPL: X_train shape: (3256, 59, 3), y_train shape: (3256,)
Training data for AMD: X_train shape: (1388, 59, 3), y_train shape: (1388,)
Training data for GOOG: X_train shape: (794, 59, 3), y_train shape: (794,)
Training data for AMZN: X_train shape: (2624, 59, 3), y_train shape: (2624,)
Training data for META: X_train shape: (1805, 59, 4), y_train shape: (1805,)
Training data for PG: X_train shape: (2624, 59, 4), y_train shape: (2624,)
Training data for MSFT: X_train shape: (2624, 59, 3), y_train shape: (2624,)
Training data for NFLX: X_train shape: (1123, 59, 3), y_train shape: (1123,)
Training data for CRM: X_train shape: (90, 59, 4), y_train shape: (90,)
Training data for ZS: X_train shape: (66, 59, 3), y_train shape: (66,)
Training data for ENPH: X_train shape: (72, 59, 3), y_train shape: (72,)
Training data for PYPL: X_train shape: (496, 59, 3), y_train shape: (496,)
Training data for COST: X_train shape: (176, 59, 3), y_train shape: (176,)
Training data for BA: X_train shape: (173, 59, 3), y_train shape: (173,)
Training data for KO: X_train shape: (120, 59, 4), y_train shape: (120,)
Training data for INTC: X_train shape: (150, 59, 4), y_train shape: (150,)
Training data for VZ: X_train shape: (17, 59, 3), y_train shape: (17,)

As we can see, not all of our stocks have enough tweets to train a good, functional model: with a 60-step sequence length, a stock needs more than 60 rows of data to produce even a single training sequence. Unfortunately, this means we have to drop F, BX, and NOC.
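To confirm this, we can check how many rows of data those tickers actually have:

In [ ]:
# Row counts for the skipped tickers vs. the 60-step sequence length
for ticker in ['F', 'BX', 'NOC']:
    print(ticker, len(stock_data[ticker]['pca_data']), "rows; need more than", sequence_length)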

For our remaining 22 stocks we can start training our LSTMs.

In [ ]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Input
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

# Function to build LSTM model
def build_lstm_model(input_shape):
    model = Sequential()

    # First LSTM layer with L2 regularization and dropout
    model.add(LSTM(50, return_sequences=True, input_shape=input_shape, kernel_regularizer=l2(0.001)))
    model.add(Dropout(0.3))

    # Second LSTM layer with L2 regularization and dropout
    model.add(LSTM(50, return_sequences=False, kernel_regularizer=l2(0.001)))
    model.add(Dropout(0.3))

    # Fully connected layers
    model.add(Dense(25, kernel_regularizer=l2(0.001)))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# Training the LSTM model for each stock
for stock_name, data in lstm_data.items():
    X_train, y_train = data['X_train'], data['y_train']
    input_shape = (X_train.shape[1], X_train.shape[2])

    model = build_lstm_model(input_shape)

    # Early stopping callback
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

    # Train the model (early stopping monitors the validation loss)
    model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2, callbacks=[early_stopping])

    lstm_data[stock_name]['model'] = model
    print(f"LSTM model trained for {stock_name} ✅")
Epoch 1/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 90ms/step - loss: 3.0907 - val_loss: 2.6106
Epoch 2/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.9250 - val_loss: 1.3075
Epoch 3/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.1733 - val_loss: 0.9958
Epoch 4/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.3260 - val_loss: 1.1186
Epoch 5/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.3693 - val_loss: 1.0492
Epoch 6/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.4616 - val_loss: 1.0219
Epoch 7/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.0030 - val_loss: 1.2236
Epoch 8/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.1705 - val_loss: 1.4546
Epoch 9/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.1504 - val_loss: 1.4405
Epoch 10/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.2464 - val_loss: 1.2308
LSTM model trained for XPEV ✅
Epoch 1/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 11s 17ms/step - loss: 0.5678 - val_loss: 0.0935
Epoch 2/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.1292 - val_loss: 0.0592
Epoch 3/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.1032 - val_loss: 0.0440
Epoch 4/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0942 - val_loss: 0.0420
Epoch 5/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0872 - val_loss: 0.0305
Epoch 6/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0818 - val_loss: 0.0322
Epoch 7/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0750 - val_loss: 0.0281
Epoch 8/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0776 - val_loss: 0.0265
Epoch 9/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0757 - val_loss: 0.0186
Epoch 10/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0662 - val_loss: 0.0508
LSTM model trained for TSLA ✅
Epoch 1/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 2s 20ms/step - loss: 1.9969 - val_loss: 0.2502
Epoch 2/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - loss: 0.3074 - val_loss: 0.1894
Epoch 3/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2674 - val_loss: 0.1805
Epoch 4/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2416 - val_loss: 0.1971
Epoch 5/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2292 - val_loss: 0.1769
Epoch 6/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2260 - val_loss: 0.1513
Epoch 7/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2002 - val_loss: 0.1609
Epoch 8/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2013 - val_loss: 0.1466
Epoch 9/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2105 - val_loss: 0.1378
Epoch 10/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1883 - val_loss: 0.1373
LSTM model trained for NIO ✅
Epoch 1/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 36ms/step - loss: 3.5736 - val_loss: 0.5421
Epoch 2/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.6202 - val_loss: 0.4679
Epoch 3/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.5172 - val_loss: 0.5894
Epoch 4/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.5106 - val_loss: 0.3803
Epoch 5/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.5025 - val_loss: 0.4082
Epoch 6/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.4206 - val_loss: 0.3455
Epoch 7/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.4255 - val_loss: 0.3188
Epoch 8/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3824 - val_loss: 0.3522
Epoch 9/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3344 - val_loss: 0.3013
Epoch 10/10
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3627 - val_loss: 0.3098
LSTM model trained for DIS ✅
Epoch 1/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 4s 18ms/step - loss: 0.9332 - val_loss: 0.1288
Epoch 2/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1886 - val_loss: 0.1100
Epoch 3/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1566 - val_loss: 0.0820
Epoch 4/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1451 - val_loss: 0.0841
Epoch 5/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1370 - val_loss: 0.0925
Epoch 6/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1277 - val_loss: 0.0599
Epoch 7/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1170 - val_loss: 0.0580
Epoch 8/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1146 - val_loss: 0.0529
Epoch 9/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1131 - val_loss: 0.0478
Epoch 10/10
151/151 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 0.1009 - val_loss: 0.0491
LSTM model trained for TSM ✅
Epoch 1/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - loss: 1.2304 - val_loss: 0.1895
Epoch 2/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2618 - val_loss: 0.1649
Epoch 3/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2488 - val_loss: 0.1655
Epoch 4/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2209 - val_loss: 0.1326
Epoch 5/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2053 - val_loss: 0.1195
Epoch 6/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1929 - val_loss: 0.1103
Epoch 7/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1730 - val_loss: 0.1073
Epoch 8/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1656 - val_loss: 0.1096
Epoch 9/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1582 - val_loss: 0.0960
Epoch 10/10
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1560 - val_loss: 0.0948
LSTM model trained for AAPL ✅
Epoch 1/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 2s 20ms/step - loss: 2.0555 - val_loss: 0.2589
Epoch 2/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - loss: 0.3709 - val_loss: 0.2013
Epoch 3/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - loss: 0.2676 - val_loss: 0.1635
Epoch 4/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2617 - val_loss: 0.1567
Epoch 5/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2589 - val_loss: 0.2192
Epoch 6/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2543 - val_loss: 0.1821
Epoch 7/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2474 - val_loss: 0.1338
Epoch 8/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2202 - val_loss: 0.1534
Epoch 9/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2241 - val_loss: 0.1248
Epoch 10/10
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1961 - val_loss: 0.1260
LSTM model trained for AMD ✅
Epoch 1/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - loss: 2.5654 - val_loss: 0.4470
Epoch 2/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.5001 - val_loss: 0.3244
Epoch 3/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3676 - val_loss: 0.2524
Epoch 4/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3511 - val_loss: 0.2333
Epoch 5/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3190 - val_loss: 0.2569
Epoch 6/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.3194 - val_loss: 0.2215
Epoch 7/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.3033 - val_loss: 0.2050
Epoch 8/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.2852 - val_loss: 0.2078
Epoch 9/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.2780 - val_loss: 0.2202
Epoch 10/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.2587 - val_loss: 0.1900
LSTM model trained for GOOG ✅
Epoch 1/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - loss: 1.4709 - val_loss: 0.1854
Epoch 2/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2833 - val_loss: 0.1636
Epoch 3/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2445 - val_loss: 0.1591
Epoch 4/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2239 - val_loss: 0.1426
Epoch 5/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2286 - val_loss: 0.1242
Epoch 6/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1986 - val_loss: 0.1189
Epoch 7/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2094 - val_loss: 0.1110
Epoch 8/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1809 - val_loss: 0.1071
Epoch 9/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1926 - val_loss: 0.1056
Epoch 10/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1632 - val_loss: 0.0965
LSTM model trained for AMZN ✅
Epoch 1/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - loss: 2.0513 - val_loss: 0.2529
Epoch 2/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - loss: 0.3119 - val_loss: 0.2028
Epoch 3/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2610 - val_loss: 0.1687
Epoch 4/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2500 - val_loss: 0.1576
Epoch 5/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2300 - val_loss: 0.1455
Epoch 6/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2202 - val_loss: 0.1469
Epoch 7/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2144 - val_loss: 0.1412
Epoch 8/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2148 - val_loss: 0.1284
Epoch 9/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2048 - val_loss: 0.1187
Epoch 10/10
46/46 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1926 - val_loss: 0.1114
LSTM model trained for META ✅
Epoch 1/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - loss: 1.6178 - val_loss: 0.2280
Epoch 2/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2838 - val_loss: 0.1893
Epoch 3/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2326 - val_loss: 0.1694
Epoch 4/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2244 - val_loss: 0.1530
Epoch 5/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1927 - val_loss: 0.1484
Epoch 6/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1913 - val_loss: 0.1354
Epoch 7/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1711 - val_loss: 0.1625
Epoch 8/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1770 - val_loss: 0.1210
Epoch 9/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1658 - val_loss: 0.1122
Epoch 10/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.1605 - val_loss: 0.1210
LSTM model trained for PG ✅
Epoch 1/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 4s 19ms/step - loss: 1.2463 - val_loss: 0.2040
Epoch 2/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2666 - val_loss: 0.1746
Epoch 3/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2413 - val_loss: 0.1598
Epoch 4/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2328 - val_loss: 0.1450
Epoch 5/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2077 - val_loss: 0.1329
Epoch 6/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2067 - val_loss: 0.1264
Epoch 7/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.1747 - val_loss: 0.1183
Epoch 8/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.1729 - val_loss: 0.1175
Epoch 9/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.1747 - val_loss: 0.1503
Epoch 10/10
66/66 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.1767 - val_loss: 0.1025
LSTM model trained for MSFT ✅
Epoch 1/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 2s 21ms/step - loss: 2.4260 - val_loss: 0.2800
Epoch 2/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3315 - val_loss: 0.1963
Epoch 3/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3106 - val_loss: 0.1864
Epoch 4/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2994 - val_loss: 0.1776
Epoch 5/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2726 - val_loss: 0.1660
Epoch 6/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2527 - val_loss: 0.2053
Epoch 7/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2817 - val_loss: 0.1633
Epoch 8/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 0.2144 - val_loss: 0.1641
Epoch 9/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.2212 - val_loss: 0.1448
Epoch 10/10
29/29 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 0.1985 - val_loss: 0.1420
LSTM model trained for NFLX ✅
Epoch 1/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 86ms/step - loss: 2.9918 - val_loss: 1.8570
Epoch 2/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.8502 - val_loss: 0.8739
Epoch 3/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.9794 - val_loss: 0.5343
Epoch 4/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.7058 - val_loss: 0.8388
Epoch 5/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.8332 - val_loss: 0.6840
Epoch 6/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.6862 - val_loss: 0.4755
Epoch 7/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.5296 - val_loss: 0.5008
Epoch 8/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.5303 - val_loss: 0.5387
Epoch 9/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.5172 - val_loss: 0.5034
Epoch 10/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.5216 - val_loss: 0.4501
LSTM model trained for CRM ✅
Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 157ms/step - loss: 5.6479 - val_loss: 3.4710
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - loss: 4.5054 - val_loss: 2.4362
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 3.3327 - val_loss: 1.5209
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - loss: 2.5439 - val_loss: 0.7591
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 1.4928 - val_loss: 0.2839
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 1.3924 - val_loss: 0.2001
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 1.2888 - val_loss: 0.2640
Epoch 8/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 1.3822 - val_loss: 0.2580
Epoch 9/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 1.3037 - val_loss: 0.2082
Epoch 10/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - loss: 1.0285 - val_loss: 0.1851
LSTM model trained for ZS ✅
Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 159ms/step - loss: 3.5716 - val_loss: 2.5184
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - loss: 2.4558 - val_loss: 1.5588
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 1.5159 - val_loss: 0.8053
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.7099 - val_loss: 0.3442
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - loss: 0.5432 - val_loss: 0.2808
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 0.7651 - val_loss: 0.3602
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.7418 - val_loss: 0.3209
Epoch 8/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.7737 - val_loss: 0.2398
Epoch 9/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.4235 - val_loss: 0.2199
Epoch 10/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.3622 - val_loss: 0.2555
LSTM model trained for ENPH ✅
Epoch 1/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - loss: 1.8366 - val_loss: 0.4085
Epoch 2/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.4418 - val_loss: 0.2979
Epoch 3/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.3869 - val_loss: 0.1930
Epoch 4/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 0.2718 - val_loss: 0.1810
Epoch 5/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.2789 - val_loss: 0.1738
Epoch 6/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.2457 - val_loss: 0.1605
Epoch 7/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.2434 - val_loss: 0.1596
Epoch 8/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.2528 - val_loss: 0.1526
Epoch 9/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.2411 - val_loss: 0.1511
Epoch 10/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.2300 - val_loss: 0.1511
LSTM model trained for PYPL ✅
Epoch 1/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - loss: 6.6852 - val_loss: 4.4975
Epoch 2/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 4.4587 - val_loss: 2.7578
Epoch 3/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.9221 - val_loss: 1.8071
Epoch 4/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.4614 - val_loss: 1.3477
Epoch 5/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.5815 - val_loss: 1.0546
Epoch 6/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.5837 - val_loss: 0.8666
Epoch 7/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.9674 - val_loss: 0.8316
Epoch 8/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.9314 - val_loss: 0.7287
Epoch 9/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.8095 - val_loss: 0.6663
Epoch 10/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.7641 - val_loss: 0.6353
LSTM model trained for COST ✅
Epoch 1/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - loss: 2.8471 - val_loss: 1.7034
Epoch 2/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.3454 - val_loss: 1.1954
Epoch 3/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 1.2284 - val_loss: 1.1602
Epoch 4/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.1381 - val_loss: 1.0356
Epoch 5/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.0708 - val_loss: 1.0837
Epoch 6/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - loss: 1.0017 - val_loss: 1.0039
Epoch 7/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 1.0020 - val_loss: 0.8878
Epoch 8/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.7625 - val_loss: 0.8345
Epoch 9/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 0.8626 - val_loss: 0.7987
Epoch 10/10
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.8028 - val_loss: 0.7436
LSTM model trained for BA ✅
Epoch 1/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 86ms/step - loss: 2.6579 - val_loss: 1.9307
Epoch 2/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 1.8682 - val_loss: 1.7814
Epoch 3/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 1.4565 - val_loss: 2.0554
Epoch 4/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 1.2917 - val_loss: 2.1096
Epoch 5/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 1.2007 - val_loss: 1.8012
Epoch 6/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.9677 - val_loss: 1.5328
Epoch 7/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.9090 - val_loss: 1.4238
Epoch 8/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.9343 - val_loss: 1.4146
Epoch 9/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.7871 - val_loss: 1.4574
Epoch 10/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.6601 - val_loss: 1.4897
LSTM model trained for KO ✅
Epoch 1/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step - loss: 4.1732 - val_loss: 3.3425
Epoch 2/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.5871 - val_loss: 1.8256
Epoch 3/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.3765 - val_loss: 0.9404
Epoch 4/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.8099 - val_loss: 0.9388
Epoch 5/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.7717 - val_loss: 0.9055
Epoch 6/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.7025 - val_loss: 0.6711
Epoch 7/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.5913 - val_loss: 0.5997
Epoch 8/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 0.5180 - val_loss: 0.6037
Epoch 9/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.4467 - val_loss: 0.5381
Epoch 10/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.4476 - val_loss: 0.4319
LSTM model trained for INTC ✅
Epoch 1/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - loss: 11.8597 - val_loss: 12.0071
Epoch 2/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 10.4299 - val_loss: 10.7360
Epoch 3/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 9.8812 - val_loss: 9.5309
Epoch 4/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 8.5876 - val_loss: 8.3641
Epoch 5/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 7.1955 - val_loss: 7.2195
Epoch 6/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 6.8852 - val_loss: 6.0835
Epoch 7/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - loss: 5.4592 - val_loss: 4.9657
Epoch 8/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 4.6330 - val_loss: 3.8846
Epoch 9/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 3.7057 - val_loss: 2.8656
Epoch 10/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 2.6467 - val_loss: 1.9476
LSTM model trained for VZ ✅

Here we can see the shape of the data as it passes through the model:

In [ ]:
lstm_model = lstm_data['TSLA']['model']
print("LSTM model summary for TSLA:")
lstm_model.summary()
LSTM model summary for TSLA:
Model: "sequential_89"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_178 (LSTM)                 │ (None, 59, 50)         │        10,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_178 (Dropout)           │ (None, 59, 50)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_179 (LSTM)                 │ (None, 50)             │        20,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_179 (Dropout)           │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_178 (Dense)               │ (None, 25)             │         1,275 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_179 (Dense)               │ (None, 1)              │            26 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 96,905 (378.54 KB)
 Trainable params: 32,301 (126.18 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 64,604 (252.36 KB)
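For reference, a model matching that summary could be built roughly like this. This is only a sketch: the input shape of (59, 3), i.e. a 59-step lookback window with 3 features, is inferred from the layer output shapes and parameter counts above, and the dropout rate and optimizer are assumptions rather than values taken from the notebook.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

# Sketch of an architecture consistent with the summary above
# (roughly 32k trainable parameters, single-value output).
sketch_model = Sequential([
    Input(shape=(59, 3)),             # 59 timesteps x 3 features (inferred from the param counts)
    LSTM(50, return_sequences=True),  # -> (None, 59, 50), 10,800 params
    Dropout(0.2),                     # rate is an assumption; dropout adds no parameters
    LSTM(50),                         # -> (None, 50), 20,200 params
    Dropout(0.2),
    Dense(25),                        # -> (None, 25), 1,275 params
    Dense(1),                         # -> (None, 1), 26 params
])
sketch_model.compile(optimizer='adam', loss='mse')
sketch_model.summary()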
In [ ]:
from sklearn.metrics import mean_squared_error

# Evaluate the LSTM model for each stock
for stock_name, data in lstm_data.items():
    X_test, y_test = data['X_test'], data['y_test']
    model = data['model']

    # Make predictions
    predictions = model.predict(X_test)

    # Calculate RMSE
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    print(f"RMSE for {stock_name}: {rmse:.4f}")
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 116ms/step
RMSE for XPEV: 0.7460
188/188 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
RMSE for TSLA: 0.2047
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
RMSE for NIO: 0.2460
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step
RMSE for DIS: 0.4599
47/47 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
RMSE for TSM: 0.1239
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
RMSE for AAPL: 0.2290
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
RMSE for AMD: 0.2310
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
RMSE for GOOG: 0.3587
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
RMSE for AMZN: 0.1984
15/15 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
RMSE for META: 0.2207
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
RMSE for PG: 0.2244
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
RMSE for MSFT: 0.2111
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
RMSE for NFLX: 0.2363
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 107ms/step
RMSE for CRM: 0.5504
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 149ms/step
RMSE for ZS: 0.5845
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 100ms/step
RMSE for ENPH: 0.3426
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step
RMSE for PYPL: 0.2401
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 97ms/step
RMSE for COST: 0.5712
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 98ms/step
RMSE for BA: 0.8761
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 99ms/step
RMSE for KO: 0.9238
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 97ms/step
RMSE for INTC: 0.3939
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 97ms/step
RMSE for VZ: 1.4876
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

# Evaluate the LSTM model for each stock and plot results
for stock_name, data in lstm_data.items():
    X_test, y_test = data['X_test'], data['y_test']
    model = data['model']

    # Make predictions
    predictions = model.predict(X_test)

    # Calculate RMSE
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    print(f"RMSE for {stock_name}: {rmse:.4f}")

    # Plot predicted vs actual results
    plt.figure(figsize=(14, 7))
    if len(y_test) > 100:
        plt.plot(y_test[:100], label='Actual')
        plt.plot(predictions[:100], label='Predicted')
        plt.title(f'Predicted vs Actual for {stock_name} (Capped at 100 data points)')
    else:
        plt.plot(y_test, label='Actual')
        plt.plot(predictions, label='Predicted')
        plt.title(f'Predicted vs Actual for {stock_name}')
    plt.xlabel('Time')
    plt.ylabel('First Principal Component')
    plt.legend()
    plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
RMSE for XPEV: 0.7460
188/188 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
RMSE for TSLA: 0.2047
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
RMSE for NIO: 0.2460
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step 
RMSE for DIS: 0.4599
47/47 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
RMSE for TSM: 0.1239
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
RMSE for AAPL: 0.2290
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step 
RMSE for AMD: 0.2310
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step 
RMSE for GOOG: 0.3587
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
RMSE for AMZN: 0.1984
15/15 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
RMSE for META: 0.2207
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
RMSE for PG: 0.2244
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
RMSE for MSFT: 0.2111
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step 
RMSE for NFLX: 0.2363
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
RMSE for CRM: 0.5504
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step
RMSE for ZS: 0.5845
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
RMSE for ENPH: 0.3426
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step 
RMSE for PYPL: 0.2401
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step 
RMSE for COST: 0.5712
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step 
RMSE for BA: 0.8761
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step
RMSE for KO: 0.9238
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step
RMSE for INTC: 0.3939
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
RMSE for VZ: 1.4876

After all that training, our models appear to be doing very, very well by the loss metrics: almost every stock comes in with a test RMSE under 1. Unfortunately, when we look at the graphs, they are also very, very overfit.

What does this mean? Overfitting can have many causes (too little data, too complex a model, etc.). In this case, though, it's probably because there are no patterns to be found at all. We tried a lot of remedies (adding L2 regularization, adding early stopping, retraining everything a second time, asking the model very nicely), and none of them helped significantly. A sketch of the regularized setup is shown below.
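For the curious, here is a minimal sketch of the regularization and early stopping we experimented with, under the same architectural assumptions as the sketch above. The dropout rate, L2 strength, batch size, epoch count, and the X_train/y_train/X_val/y_val names are all placeholders, not values taken from the notebook.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping

def build_regularized_lstm(timesteps=59, n_features=3, l2_strength=1e-3):
    """Same two-layer LSTM as above, but with L2 weight penalties on the recurrent layers."""
    model = Sequential([
        Input(shape=(timesteps, n_features)),
        LSTM(50, return_sequences=True, kernel_regularizer=l2(l2_strength)),
        Dropout(0.2),
        LSTM(50, kernel_regularizer=l2(l2_strength)),
        Dropout(0.2),
        Dense(25),
        Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Stop training once the validation loss stops improving and roll back to the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Hypothetical usage (X_train, y_train, X_val, y_val are placeholders):
# model = build_regularized_lstm()
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=50, batch_size=32, callbacks=[early_stop])

In our runs, none of this moved the needle much, which is consistent with the "no pattern to find" explanation above.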

Conclusions¶

In conclusion, at least with this dataset and these models, sentiment analysis did not give us a reliable way to predict stock prices. Unfortunate. But that doesn't make this tutorial useless! Just because we can't use tweets and social media posts to game the stock market doesn't mean the same pipeline can't be applied elsewhere: to raw stock data on its own, or to completely different time-series datasets. There are plenty of data collections spanning long periods of time, and plenty of other conclusions to be drawn from them.

Additionally, there were some things we observed without using our model: daily tweet volume is positively correlated with the volume of stock traded that day, daily price changes have a slight positive correlation with the number of tweets made that day, and people really, really like tweeting about Tesla.